ISSS608: AY2021-22(T2): DataViz Makeover 2 Data Preparation Log

This page contains all data preparation steps needed to reproduce the Tableau Visualisation in DataViz Makeover 2

Melissa Tan (SMU MITB Analytics Track)https://scis.smu.edu.sg/master-it-business/analytics-track/curriculum?gclid=CjwKCAiA3L6PBhBvEiwAINlJ9EJwYxpaZv-zPxR0UMntDh37TrlWU7jwXP9Dcu9jvWvN8uEJsOWzTRoCqrQQAvD_BwE
2022-03-27

Data Cleaning on JMP Pro

Step 1:

Open the dataset “origin_destination_bus_SZ_202201.csv” provided for this task. Inspect for missing values and other data issues. We can see that out of more than 5 million rows, there are 29,085 missing rows for the origin fields and 26,975 missing rows for the destination fields.

Step 2:

Zoom into the missing rows to check. Use the information from the ORIGIN_PT_CODE field to backfill the missing data for ORIGIN_SZ and ORIGIN_PA.

One way to do this is to use the search feature on the Masterplanning page of the URA Maps website to type in the each ORIGIN_PT_CODE (bus-stop) number and identify the correct Subzone and Planning Area.

For reference, these are the missing information to look out for. It will not be wise to remove these missing rows because they contain data from populous areas and interchanges such as Yishun, Tampines, Woodlands, as well as arterial roads such as NEW UPPER CHANGI ROAD.

Ten rows were eventually removed as they were bus-stops located in Johor Baru, Malaysia, outside the scope of the task (46211, 47711)

Step 3

Create 2 new datatables by filtering out the dataset by DAY_TYPE (WEEKDAY AND WEEKENDS/HOLIDAY)

Creating the adjacency matrix visualisation

Step 4:

Create the OD matrix for the WEEKDAY datatable for each by using the split table function with the parameters organised as follows

Replace all null values with 0s. Repeat Step 4 for the WEEKEND datatable. Export both to Excel.

Step 5:

Open the WEEKDAY OD MATRIX in Excel. Create a new calculated field TOTAL TRIPS (ROW) by summing up each row.

Step 6:

Create a new worksheet PCT_WKDAY_TRIPS. Calculate the percentage of each cell by dividing the absolute number of trips in that cell by the row total * 100.

Repeat steps 5 and 6 for WEEKEND OD MATRIX.

Step 7:

Open WEEKDAY OD MATRIX Excel on Tableau. Create a new sheet and arrange the columns, rows, and details as shown below to get the adjacency matrix

Step 8:

To format the matrix, right click on the horizontal labels and select Rotate Label for the names of the planning areas to appear vertically.

For the Filters, use the Multiple Values Dropdown Menu option for a neater and cleaner look.

Step 9:

Repeat Steps 7 and 8 for WEEKEND OD MATRIX.

Change the colour of the matrix to brown to differentiate it from the WEEKDAY matrix. Adjust the title of each OD Matrix to match.

Creating the Chloropleth

Step 10:

For the WEEKDAY datatables created in Step 3, use the STACK TABLE feature in JMP Pro to pivot the dataset by organising the fields as below.

Label the new columns “Label” and “SZ_Name”. You will notice that the corresponding total trips has been calculated.

Export the file into Excel for further processing on Tableau Prep Builder. You will realise that due to the large file size, exporting the entire file will result in a truncated dataset.

Save the dataset in tranches of no larger than 300k rows that can be joined back in Tableau Prep.

Step 11:

Repeat step 10 for the WEEKEND datatable.

Step 12:

Connect to the “MP14_SUBZONE_WEB_PL.shp spatial file that contains all the map details of Singapore. The file can be found here on data.gov

Step 13:

Open the WEEKDAY datasets created in step 10 in Tableau Prep Builder and use the “join” and “add” functions to rebuild the dataset.

Drag the spatial file into the main frame. Use the join function to link up the 2 datasets by Subzone Name i.e. SZ_NAME from the WEEKDAY dataset and SUBZONE_N from the spatial file.

Step 14

Output the flow as an extract for use in Tableau Desktop.

Step 15

Repeat Steps 12-14 for the WEEKEND dataset.

Step 16

Open the extract for WEEKDAY on Tableau. Create a new sheet and arrange the columns, rows, and details as shown below to get the chloropleth

Step 17:

Rename the options in the LABELS filter by changing the Aliases to OUTBOUND and INBOUND as shown here

Step 18:

Format the legend by choosing the single-value drop down option for the Origin PA filter, and the single-value list option for the IN/OUTBOUND filter.

Click on the Colours option in the Marks menu and adjust the opacity to 80%

Edit the tooltip to show the following information

Step 19

Edit the Title to make it dynamically change according to the filter options selected by inserting the fields as shown

Creating the Bar Charts

Step 20

Create a new worksheet for the bar charts. Arrange the fields in the appropriate columns, rows, filters as shown below.

Step 21

To link the ORIGIN_PA and Label filters for the bar chart to the same filters in the chloropleth by right-clicking on the relevant filter >> Apply to Worksheets >> Selected Worksheets

In the dialog box that appears, make sure that the two desired sheets are checked. Now the 2 filters will be linked and any change to the filter options in the chloropleth will be reflected in the bar chart.

Step 22

Add a reference line by right-clicking on the y-axis and setting the options in the dialogue box as Average, and Label >> Show Value.

Step 23

Format the bar charts by narrowing the width of the bars (Select Size from the Marks menu), tidy up the tooltip and the labels of the axes. The final bar chart should look something like this

Step 24

Repeat steps 16 to 23 for the WEEKEND dataset. Change the colours of the charts to Brown to differentiate it from the WEEKDAY charts

Step 25

Link the ORIGIN_PA filters for the WEEKDAY and WEEKEND matrices by right-clicking on the filter pill >> Apply to Worksheets >> Selected Worksheets. Ensure that the 2 boxes are checked.

Creating the Story

Step 26:

Combine all the visualisations into a single workbook by cutting and pasting the 2 sheets with the adjacency matrices to the workbook containing the chloropleths.

Step 27:

To create the first tab in the visualisation create a new dashboard “MAP VIEW” and drag the 4 sheets containing the WEEKDAY and WEEKEND barcharts and chloropleths into the main frame.

At the Size section on the left menu bar, select “Automatic” so that the visualisations will automatically scale to the size of the viewer’s screen.

Step 28:

To create the second tab, create a second dashboard “MATRIX VIEW” and drag the 2 OD matrices into the main frame.

At the Size section on the left menu bar, select “Automatic” so that the visualisations will automatically scale to the size of the viewer’s screen.

Since the filters are linked, only one set is required.

Step 29

Create a new Story “Exploring Bus Trips in Singapore (January 2022)” by dragging the 2 dashboards to the tabs in the main frame.

Step 30:

Publish the Story to Tableau Public.

The final visualisation should look like this:

Map View Tab

Adjacency Matrix Tab